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ABSTRACT 

Context. Infrared spectroscopy of primary and secondary eclipse events probes the composition of exoplanet atmospheres and, using 
space telescopes, has detected ITO, CH4 and CO2 in three hot Jupiters. However, the available data from space telescopes has limited 
spectral resolution and does not cover the 2.4 - 5.2yum spectral region. While large ground based telescopes have the potential to 
obtain molecular-abundance-grade spectra for many exoplanets, realizing this potential requires retrieving the astrophysical signal in 
the presence of large Earth-atmospheric and instrument systematic errors. 

Aims. Here we report a wavelet-assisted, selective principal component extraction method for ground based retrieval of the dayside 
spectrum of HD 189733b from data containing systematic errors. 

Methods. The method uses singular value decomposition and extracts those critical points of the Rayleigh quotient which correspond 
to the planet induced signal. The method does not require prior knowledge of the planet spectrum or the physical mechanisms causing 
systematic errors. 

Results. The spectrum obtained with our method is in excellent agreement with space based measurements made with HST and 
Spitzer (Swain et al. 2009b; Charbonneau et al. 2008) and confirms the recent ground based measurements (Swain et al. 2010) 
including the strong ~ 3.3 fim emission. 
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1. Introduction 



Detection of molecules in exoplanet atmospheres via in- 
frared spectroscopy from space-based tel e scopes is now routin e 
(ISwain et all 120081 iGrillmair et all 120081: ISwain etail l2009bfll 
Tinet ti et al.ll2010h . Recently. ISwain et alJ (12010 ) demonstrated 
molecular spectroscopy of an exoplanet atmosphere with 
ground-based measurements. With the availability of numer- 
ous large ground based telescopes equipped with infrared spec- 
trometers, there is a great potential to obtain a large quantity 
of "molecular-abundance-grade" spectra; realizing this poten- 
tial requires developing optimal signal extraction algorithms to 
retrieve the spectral signature of an exoplanet atmosphere in 
the presence of large Earth-atmospheric and instrument sys- 
tematic errors. Here we present a method based on Principal 
Component Analysis (PCA) capable of detecting the exoplanet 
emission spectrum from the ground. PCA is a well established 
method with numerous astronomomical applications; it has been 
used to search for a n exoplanet signal i n ground-based spectro- 
scopic observations dBrown et alJ l2002) and a relat ed method is 
used in SysRem for exoplanet eclipse detections (Tamu zet alJ 
120051) . We apply our PCA based method to extract the dayside 
emission spectrum of HD 189733b in the K and L bands and 
we c ompare the resultin g spectrum with previously reported re- 
sults dSwain et al. I l2oioh . which were obtained using a different 
method. 



2. Observations and Initial Calibration 

The spectru m presented here is based on the same observa- 
tions used in lSwain et alJ (1201 Oh but analyzed using the Selective 
Principal Component Extraction and Reconstruction (SPCER) 
method described in Sect. 3. A secondary eclipse of HD 189733b 
was observed on August 11 th , 2007 using the SpeX instrument 
mounted on the NASA Infrared Telescope Facility (IRTF). The 
spectroscopic time series begins approximately one hour before 
the onset of ingress and ends approximately one hour after the 
termination of egress. The details of the observations and th e 
reduction with SpexTool are presented in Swa in et all {2010). 
The result of the standard SpexTool calibration is a flux density 
time series with ~ 4 % variations. We then employ the initial 
ca libration step (nor malization & airmass correction) outlined 
in lSwain et alJ (120101) . Finally, we separate the exoplanet eclipse 
astrophysical signal from the residual systematic errors using the 
SPCER method and thus obtain the exoplanet emission spec- 
trum. 



3. Method 

The astrophysical signal from the exoplanet is present in all 
spectral channels and we have developed the SPCER method 
to identify this signal (i.e. the exoplanet eclipse) and separate 
it from other signal components which are not of astrophysi- 
cal origin and present in all spectral channels (i.e. systematic 
errors). The method is based on Principal Component Analysis 
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(PCA), which performs an orthogonal transformation that aligns 
the transformed axes in the directions of maximum variance. 
From statistical viewpoint, the eigenvectors of the covariance 
matrix of a dataset are the critical points of the Rayleigh quotient 
and when ranked according to the magnitude of the eigenvalues, 
they represent the axes of maximum variance of that data. To 
successfully separate non-random signals (systematic errors; the 
astrophysical signal) from random noise, it is desirable that the 
time series data has a high signal to noise (SNR) ratio. Therefore, 
we prefilter the time series prior to applying the SPCER method. 
After prefiltering, we perform singular value decomposition of 
a set of time series at different wavelengths. This conventiently 
separates the astrophysical signal from some of the larger sys- 
tematic errors present in the data. We then reconstruct the astro- 
physical signal dominated time series and measure the exoplanet 
emission spectrum. In this section, we outline the methodology. 
In Sect. 4, we discuss the results of applying this method to the 
HD 189733b dataset. 



3.1. prefiltering 

To clean the time series and at the same time retain the dynamic 
features of the data, we have implemented wavelet transform 
based adaptive signal extraction. Wavelets are time-localized be- 
cause the supports of wavelet functions are finite. This makes 
wavelets excellent for representing discrete events (e.g. abrupt 
changes linked to outlier points). The wavelet basis functions 
are constructed from a single function, termed the "Mother 
Wavelet", ifo.o- The time series is represented as a set of wavelet 
functions, ij/j^, constructed by a combination of dilation and 
translation of the mother function: iffjf = 2-" 2 ^o,o(2-'i - k). 
Here, we implemented the standard wavelet decomposition- 
thresholding-reconstruction procedu re which is widely used in 
signa l and image processing fields (|Sidnev et all 19981: iM allat 
1989, [19991 iDonohol 119951 iDonoho & Johnstond U995lT ^As 
mother wave let, we use the standard Daubechies 3 wavelet 
(Malla ll999l) . We obtain the wavelet decomposition by a multi- 
scale representation and using the transform coefficients a J k (k th 

scaling function at j th scale) and ft' k (k' h wavelet function at j th 
scale) defined as, 



j'+i 

+2k 



4 = f fWW - k)dt = J] h(n)a J n 
Pi = f fWo,o(2 J t ~ k)dt = J] S(n)a^ 2 



(1) 

(2) 



where the scaling function <p can be respresented as a signal 
with a low-pass s pectrum and expressed in terms of wavelets 
(see lMalialll989t) : /(f) is the time series for a specific wave- 
length. h(ri) and g(ri) represent low pass and high pass filter 
coefficients respectively and are obtained using Daubechies 3 
wavelets in MATLAB. For a detailed description on the theory 
of wavelet decompo sition, the reader is referred to lSidnev et alj 
JT998b : lMalla3(fT989b . To produce a smoother estimate and con- 
tinuous mapping during wavelet shrinkage, we implement a soft 
thresholding scheme, J]t(Wj) = sign(w/)(|w ; -| - T)+, where vvy 
is substituted by the different a! and /3 J k , along with the univer- 
sal threshold T = cr^ ^/2Tn{N), where N is the current level of 
wavelet decomposition and cr N is the approximation of noise 
at this level obtained using the robust median estimator. After 
thresholding, we reconstruct the time series using reconstruction 
filters and the iterative reconstruction method. Our wavelet based 
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Signal reconstructed using 2™ PC 
i.e. secondary eclipse signal of the exoplanet HD 189733b 
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Fig. 1. Separating the astrophysical signal from systematic er- 
rors. The different panels depict: top: an original, normalized 
and airmass corrected time series, middle: signal reconstructed 
using the first PC, bottom: signal reconstructed using the sec- 
ond PC. In the bottom panel, we show using a black dashed line 
the secondary eclipse lightcurve overplotted on the reconstructed 
data. The singular value decomposition of a set of time series at 
different wavelengths allows us to extract the exoplanet signa- 
ture, largely free of systematic errors. We show the data at the 
original, unbinned time sampling. 



signal extraction algorithm accepts signal F^(t) as input and re- 
turns the de-noised signal F™(t). As a result of this procedure, we 



get on average a 56 % improvement in the 
the individual time series measurements. 



standard deviation 
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3.2. separating multiple patterns in the data 

We start by averaging the time series FJ(f) in the spectral do- 
main (mean of 5 adjacent spectral channels) to reduce the num- 
ber of PCs and get the time series F"(t). Given X, a subset of 
F*(t) composed of P spectral channels, we center X by subtract- 
ing X, the mean of each column of X. Through singular value 
decomposition we find the eigenvalues {A = {A.\,Az,..., Ap}) and 
the eigenvectors (principal components C = \C\, C2 ■ ■ ■ Cp}) of 
the covariance matrix of X. We then project the centered data 

onto the principal component axes to get D\ pc , the representation 
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of X in the principal component space using 9\ pc -X®C where 
® denotes matrix multiplication. 

For our purposes, we want to decompose the signal present 
in the ensemble of time series into various components, e.g. flux 
variation due to secondary eclipse and Earth-atmospheric effects 
etc. For this, we transform each individual principal component 
C r (C r = C(r), r = 1,2, , . . , AO into the time domain as, 



5 r , v =l + 9V(r)®(C(r)) 7 



forr = 1,2, ...,N. 



(3) 



At this point, what we have done is transform our set of time 
series so that it is expressed in terms of patterns (principal com- 
ponents) that optimally represent the covariance (commonality) 
of X. Each S fti'pc is such a reconstructed pattern and it shows 
how the signal would have looked like if it was due to this single 
PC. We can now determine which PCs represent the eclipse and 
which represent confusing systematic errors. 



3.3. extracting the exoplanet spectrum 

To select the principal components corresponding to the eclipse, 
we calculate the linear correlation coefficient (CC) between each 
of the individually reconstructed signals S r th pc and the expected 
light curve shape. No prior knowledge on the depth o f the eclipse 
is req uired for this; only the exoplanet ephemerides dWinn et alj 
120071 for HD 189733) is needed. We select a set of principal 
components capturing the eclipse event using selection criteria 
described below and reconstruct the data using these compo- 
nents (Sr pc ). As a result of this reconstruction, we get an exo- 
planet eclipse light curve in which the effect of confusing system- 
atics is greatly reduced. For each wavelength channel in X, we 
determine the eclipse depth and the error bar based on the stan- 
dard deviation of the time series in-eclipse and out-of-eclipse. 
The eclipse depth for X is the average eclipse depth and the er- 
ror bar is composed of the error on the eclipse depth for each 
wavelength channel and the standard deviation of the depths for 
each wavelength channel in X: 



r eclipse depth ^ 



■) 2 + cr 



depths 



(4) 



with (r^ and cr out the standard deviation of the flux in-eclipse 
and out-eclipse regions for each wavelength channel in X; Ni n 
and N oM the number of observations in and out of eclipse; K the 
number of wavelength channels in X; cr depths the standard devia- 
tion of the eclipse depth for the different wavelength channels in 
X. 

The selection criteria to decide whether a principal compo- 
nent captures the exoplanet eclipse event are: 

1. \CC\ PC ,i > |CC|thre S hoid, with |CC|thre S hoid = mean(|CC|) + 
stddev(|CC|); the correlation coefficient of the PC must be 
large enough, such that it describes well the eclipse event. 

2. Rpc.i < ^threshold with Rpcj the eigenvalue rank of the i th 
selected PC and /^threshold is the cutoff value for rank. We have 
chosen ^threshold = 9 (see later). 

3 ■ \Rpc,i ~Rpc,i+ 1 1 < A thresho id ; A thresho i d is the cutoff value for the 
difference in the ranks of the selected PCs. We have chosen 
Athreshoid = 6 (see later). 

If no PC is found under these selection criteria, we assign an up- 
per limit to the eclipse depth, matching the lowest eclipse depth 
found in the dataset and conclude that we could not find the 
eclipse for those wavelength channels. 




Principal Component Number 

Fig. 2. Selecting the principal components which capture the ex- 
oplanet eclipse. The linear correlation coefficients between the 
PCs and the model eclipse light curve are shown for representa- 
tive wavelengths for the K and L-band. The eclipse is captured 
by PCs with a high correlation coefficient. The selection of PCs 
is rather straightforward with very large correlation coefficients 
being found at low PC numbers. 



4. Results and Discussion 

We have applied the method outlined above to the spectroscopic 
time series on the secondary eclipse of HD 189733b observed 
w ith the IRTF . We c hose to use the same spectral binning as 
in ISwain et al.l (120101) . such that we can easily compare the re- 
sults. In the K-band and L-band, we used 100 and 150 spectral 
channels respectively to construct X. We then processed the data 
through the different steps outlined above and a sample result 
of the principal component extraction is shown in Fig. Q] Blue 
symbols show the normalized original light curve which is dom- 
inated by residual systematic errors. Red and green curves show 
the signal reconstructed using the first and the second PC respec- 
tively. The model eclipse curve is shown in black. The first PC 
can be seen to represent much of the systematic errors present in 
the original signal, while the second PC is clearly the exoplanet 
eclipse. This demonstrates the potential of SPCER in separating 
systematic effects for ground based observations and in retriev- 
ing the exoplanet signal. It does so without requiring knowledge 
of the physical mechanism causing the corruption nor requiring 
prior knowledge on the exoplanet. 

To determine the exoplanet spectrum, we have selected S r pc 
using the criterium outlined above with /threshold = 9 and 
Athreshoid = 6. In Fig. |2l we illustrate the calculated correlation 
coefficients with the light curve for K and L-band wavelengths. 
It is clearly seen that a PC either matches really well (CC > 0.65) 
or not very well, making the selection rather straightforward and 
changes in the selection criterium have little influence. The re- 
sult of the procedure, the exoplanet spectrum of HD 189733b, 
is shown in Fig. [3] The SPCER metho d reproduces the mea- 
sure ments by HST dSwain et alJl2009ah . the Spitz er photome- 
try ( C harbonneau et al. . 2008) and those reported in dSwain et al.l 
2010) extremely well, within the error bars. 

A few considerations we would like to mention are: 



importance of pre-cleaning the data; removing the biggest 
errors first. The data processing outlined in ISwain et"a"D 
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Fig. 3. Dayside emission spectrum. The K-band (left) and L-band (right) HD 189733 b emission spectrum obtained using SPCER 
is shown usin g green squares and is compared to the space-based HST spectru m dSwain et al. 200 9a) and Spitzer broad-band 
photometry (Charbo nneau et ai]|2008l) and the ground-based spectrum reported in lSwain et alj ( 1201 Oft . We also show the Spitzer 
passband (blue line) for the 3.6 pm photometric point and have averaged the SPCER result over this passband to make comparison 
easy (blue open symbol). There is excellent agreement between the space and ground-based datasets. 



(2010) includes removal of systematic errors correlated in 
wavelength and airmass. If this step is not taken, it is still fea- 
sible to find the eclipse signal, but at lower eigenvalue rank. 
F or instance, not per forming the airmass correction outlined 
in lSwain et al.l(l2010t) . shifts the eclipse signal from a typical 
eigenvalue rank 2 to eigenvalue rank 4. 
importance of selecting (sometimes) multiple PCs. There is a 
lack of control in the choice of orthogonal basis functions in 
the singular value decomposition. This means that the eclipse 
is not always captured in a single PC, and several PCs (typi- 
cally two) may be needed to represent the eclipse. When the 
reconstruction is done based solely on one PC, the spectral 
shape is preserved, but the average eclipse depth can change 
by up to ~ 20%. 

possible improvement by post-processing the PCs. The 
SPCER method does not necessarily decompose the data en- 
tirely into an eclipse and non-eclipse component, because no 
priors are included (which is also the strength of the method). 
As such, some residual systematic error can be convolved 
with the eclipse signal. Simple post-processing, like fitting 
a polynomial to the out-of-eclipse part of the light curve, 
can potentially enhance the SNR of the exoplanet spectrum. 
For this particular dataset, this was not necessary, but might 
prove advantageous for other datasets. 

further improvements are likely possible by incorporating 
the known eclipse shape. In the method presented here, no 
prior knowledge of the eclipse shape is used during the 
disentanglement of systematic effects and eclipse signal. 
Methods that incorporate priors, such as an iterative matched 
filter, have the potential to improve the method presented 
here. 

equal signal to noise ratio for the different wavelength chan- 
nels is assumed when using PC A. For the dataset analyzed 
here, the different wavelength channels in each spectral bin 
X differ only marginally in count rate (the standard devia- 
tion of the normalized SNR is on average 4 % and is always 
less than 13 % for the different wavelength bins X) and reg- 
ular PCA is therefore appropriate. If this is not the case, 



a SysRem type of d own-weighting of w avelength channels 
with low count rates ( Ta muz et al.l 2005ft . is needed. 



In summary, the coupled wavelet denoising-SPCER method 
presented here is a new and powerful method for ground based 
exoplanet calibration. A useful aspect of the method lies in its 
ability to reject systematic errors without the need for knowledge 
of the underlying physical mechanism. This gives us the abil- 
ity to separate the original planetary signal from the relatively 
large systematic effects. The results of the SPCER-based calibra- 
tion for the emission spectrum of HD 1897 33b are in excellent 
agreement w ith those obtained with H ST dSwain et al. 2009b) 
and Spitzer (Charb onneau et ail 120081) and co nfirm the recent 
ground based measurements dSwain et al.l2010h . The strength of 
SPCER in retrieving the exoplanet spectrum from ground based 
observations is a proof of concept for its more general applica- 
tion in fields where signal deeply buried under noise must be 
extracted; the method has applications in a variety of fields in- 
cluding earth and atmospheric sciences, telecommunication sys- 
tems, measurement instruments, biomedical engineering, optics, 
image processing and controls, where problems of not having 
good signal to noise ratio before signal amplification are preva- 
lent or where pattern recognition is critical. 
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